Improvements and Generalizations of Stochastic Knapsack and Multi-Armed Bandit Algorithms: Extended Abstract

Author

  • Will Ma
Abstract

The celebrated multi-armed bandit (MAB) problem, originating from the work of Gittins et al. [GGW89], presumes a condition on the arms called the martingale assumption. Recently, A. Gupta et al. obtained an LP-based 1/48-approximation for the problem with the martingale assumption removed [GKMR11]. We improve the algorithm to a 4/27-approximation, with simpler analysis. Our algorithm also generalizes to the case of MAB superprocesses with (stochastic) multi-period actions. This generalization captures the explore-exploit budgeted learning framework introduced by Guha and Munagala [GM07a, GM07b]. Also, we obtain a tight (1/2 − ε)-approximation for the variant where preemption (playing an arm, switching to another arm, then coming back to the first arm) is not allowed. This contains the stochastic knapsack problem of Dean, Goemans, and Vondrák [DGV08] with correlated rewards, for both the cancellation and no-cancellation cases, improving the 1/16- and 1/8-approximations of [GKMR11], respectively. Our algorithm samples probabilities from an exponential-sized dynamic programming solution, whose existence is guaranteed by an LP projection argument. We hope this technique can also be applied to other dynamic programming problems which can be projected down onto a small LP.

∗ willma353@gmail.com, Operations Research Center, Massachusetts Institute of Technology. Supported in part by the NSERC PGS-D Award, NSF grant CCF-1115849, and ONR grants N00014-11-1-0053 and N00014-11-1-0056.
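To make the correlated stochastic knapsack variant mentioned in the abstract concrete, the following Python sketch brute-forces the optimal adaptive policy on a tiny no-cancellation instance by dynamic programming. The names ITEMS, BUDGET, and opt, as well as the instance itself, are hypothetical and chosen only for illustration; this is a model of the problem being approximated, not the paper's LP-projection or probability-sampling algorithm.

from functools import lru_cache

# Toy correlated stochastic knapsack without cancellation: starting an item
# realizes one (size, reward) outcome; the reward is collected only if the
# realized size fits the remaining budget, and the process stops once an item
# overflows. This brute-force DP is exponential in the number of items and is
# meant only to illustrate the problem, not the paper's algorithm.

ITEMS = [
    # each item: list of (probability, size, reward) outcomes, probabilities summing to 1
    [(0.5, 1, 1.0), (0.5, 3, 4.0)],
    [(0.7, 2, 2.0), (0.3, 4, 6.0)],
    [(1.0, 2, 1.5)],
]
BUDGET = 5

@lru_cache(maxsize=None)
def opt(remaining, used_mask):
    # Optimal expected reward of an adaptive policy with `remaining` capacity
    # left and the items indicated by `used_mask` already played.
    best = 0.0  # the policy is always free to stop
    for i, outcomes in enumerate(ITEMS):
        if used_mask & (1 << i):
            continue
        value = 0.0
        for p, size, reward in outcomes:
            if size <= remaining:
                value += p * (reward + opt(remaining - size, used_mask | (1 << i)))
            # an overflowing outcome yields nothing and ends the process
        best = max(best, value)
    return best

print("optimal adaptive value on the toy instance:", opt(BUDGET, 0))

Even on such a small instance, the optimal policy is adaptive: which item to play next depends on the sizes realized so far, which is what makes polynomial-sized LP relaxations of this exponential-sized DP, and the approximation guarantees quoted above, nontrivial.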


Similar Articles

Improvements and Generalizations of Stochastic Knapsack and Multi-Armed Bandit Approximation Algorithms: Full Version

The multi-armed bandit (MAB) problem features the classical tradeoff between exploration and exploitation. The input specifies several stochastic arms which evolve with each pull, and the goal is to maximize the expected reward after a fixed budget of pulls. The celebrated work of Gittins et al. [GGW89] presumes a condition on the arms called the martingale assumption. Recently, A. Gupta et al....


Time-Constrained Restless Bandits and the Knapsack Problem for Perishable Items (Extended Abstract)

Motivated by a food promotion problem, we introduce the Knapsack Problem for Perishable Items (KPPI) to address a dynamic problem of optimally filling a knapsack with items that disappear randomly. The KPPI naturally bridges the gap and elucidates the relation between the PSPACE-hard restless bandit problem and the NP-hard knapsack problem. Our main result is a problem decomposition method resu...


Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem

This paper is devoted to regret lower bounds in the classical model of the stochastic multi-armed bandit. A well-known result of Lai and Robbins, which was later extended by Burnetas and Katehakis, established a logarithmic bound for all consistent policies. We relax the notion of consistency, and exhibit a generalisation of the logarithmic bound. We also show the non-existen...


Lower bounds and selectivity of weak-consistent policies in stochastic multi-armed bandit problem

This paper is devoted to regret lower bounds in the classical model of stochastic multi-armed bandit. A well-known result of Lai and Robbins, which has then been extended by Burnetas and Katehakis, has established the presence of a logarithmic bound for all consistent policies. We relax the notion of consistency, and exhibit a generalisation of the bound. We also study the existence of logarith...


Multi-armed Bandit Problem with Lock-up Periods

We investigate a stochastic multi-armed bandit problem in which the forecaster’s choice is restricted. In this problem, rounds are divided into lock-up periods and the forecaster must select the same arm throughout a period. While there has been much work on finding optimal algorithms for the stochastic multi-armed bandit problem, their use under restricted conditions is not obvious. We extend ...



Journal title:

Volume   Issue

Pages   -

Publication date: 2013